A New Concise and Lossless Representation of Frequent Itemsets Using Generators and A Positive Border
Abstract
A complete set of frequent itemsets can get undesirably large due to redundancy when the minimum support threshold is low or when the database is dense. Several concise representations have been proposed to eliminate the redundancy. Existing generator-based representations rely on a negative border to make the representation lossless. However, negative borders of generators are often very large. The number of itemsets on a negative border sometimes even exceeds the total number of frequent itemsets. In this paper, we propose to use a positive border together with frequent generators to form a lossless representation. A positive border is usually orders of magnitude smaller than its corresponding negative border. A set of frequent generators plus its positive border is never larger than the corresponding complete set of frequent itemsets, so it is a true concise representation. The generalized form of this representation is also proposed. We develop an efficient algorithm, called GrGrowth, to mine generators and positive borders as well as their generalizations. The GrGrowth algorithm uses a depth-first search strategy to explore the search space, which is much more efficient than the breadth-first search strategy adopted by most of the existing generator mining algorithms. Our experimental results show that the GrGrowth algorithm is significantly faster than level-wise algorithms for mining generator-based representations, and is comparable to the state-of-the-art algorithms for mining frequent closed itemsets.
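To make the idea of "generators plus a positive border" concrete, the following is a minimal brute-force sketch in Python. It is not the GrGrowth algorithm. It rests on assumptions that the abstract itself does not state: a generator is taken to be an itemset with no proper subset of the same support, and the positive border is taken to be the set of minimal frequent non-generators. The function names (`support`, `mine`, `derive_support`) and the toy transactions are hypothetical and serve only as illustration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def mine(transactions, min_sup):
    """Brute-force enumeration (exponential; illustration only).

    Returns three dicts mapping frozenset -> support:
    all frequent itemsets, frequent generators, and the positive border.
    """
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            sup = support(s, transactions)
            if sup >= min_sup:
                frequent[s] = sup
    # Assumed definition: a generator has no immediate subset with equal support.
    generators = {s: sup for s, sup in frequent.items()
                  if all(frequent[s - {i}] != sup for i in s)}
    # Assumed definition: border = frequent non-generators whose immediate
    # subsets are all generators (i.e. minimal non-generators).
    border = {s: sup for s, sup in frequent.items()
              if s not in generators and all((s - {i}) in generators for i in s)}
    return frequent, generators, border


def derive_support(itemset, generators, border):
    """Recover the support of `itemset` from generators + border only.

    Returns None when the itemset is infrequent.
    """
    s = frozenset(itemset)
    while True:
        if s in generators:
            return generators[s]
        if s in border:
            return border[s]
        # Look for a border itemset strictly contained in s.
        hit = next((b for b in border if b < s), None)
        if hit is None:
            return None  # not a generator and no border subset: infrequent
        # Some item i of `hit` is redundant: supp(hit - {i}) == supp(hit),
        # hence supp(s - {i}) == supp(s), so i can be dropped from s.
        i = next(i for i in hit if generators[hit - {i}] == border[hit])
        s = s - {i}


# Hypothetical toy database of four transactions.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("a")]
frequent, generators, border = mine(transactions, min_sup=1)
print(len(frequent), len(generators), len(border))
```

On this toy database the representation stores fewer itemsets than the full frequent set, and `derive_support` recovers every frequent support from it, which is the sense in which such a representation is lossless under the definitions assumed above.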
Related resources
Positive Borders or Negative Borders: How to Make Lossless Generator Based Representations Concise
A complete set of frequent itemsets can get undesirably large due to redundancy. Several representations have been proposed to eliminate the redundancy. Existing generator based representations rely on a negative border to make the representation lossless. However, negative borders of generators are often very large. The number of itemsets on a negative border sometimes even exceeds the total n...
Non-Derivable Item Set and Non-Derivable Literal Set Representations of Patterns Admitting Negation
The discovery of frequent patterns has attracted a lot of attention from the data mining community. While extensive research has been carried out on discovering positive patterns, little has been offered for discovering patterns with negation. The main hindrance to the progress of such research is the huge number of frequent patterns with negation, which exceeds the number of frequent positive pa...
Simultaneous mining of frequent closed itemsets and their generators: Foundation and algorithm
Closed itemsets and their generators play an important role in frequent itemset and association rule mining. They allow a lossless representation of all frequent itemsets and association rules and facilitate mining. Some recent approaches discover frequent closed itemsets and generators separately. The Close algorithm mines them simultaneously but it needs to scan the database many times. Based...
Exploring the Disjunctive Search Space towards Discovering New Exact Concise Representations for Frequent Patterns
Extracting concise representations seems to be a milestone towards the emerging knowledge extraction field. In fact, it is a quite survival reflex towards providing a manageably-sized and reliable knowledge. Thus, we bashfully witness the emergence of a trend towards extracting concise representations, e.g., closed patterns, non-derivable patterns and essential patterns. The essential pattern-b...
Negative Generator Border for Effective Pattern Maintenance
In this paper, we study the maintenance of frequent patterns in the context of the generator representation. The generator representation is a concise and lossless representation of frequent patterns. We effectively maintain the generator representation by systematically expanding its Negative Generator Border. According to our literature review, no prior work has studied the maintenance of the...